A User's Guide to the Encyclopedia of DNA Elements (ENCODE)

Author: The ENCODE Project Consortium
Publication date: 01/01/2011

The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

Carolina Digital Repository

Hardy-Weinberg Equilibrium Testing of Biological Ascertainment for Mendelian Randomization Studies

Author: Cupples
Davey-Smith
Gu
Guarnieri
Hardy
Hedrick
Hingorani
Ian N. M. Day
Kavvoura
Santiago Rodriguez
The ENCODE Project Consortium
The International HapMap Consortium
The Wellcome Trust Case Control Consortium
Tom R. Gaunt
Weinberg
Publication venue: Oxford University Press
Publication date: 15/02/2009

Mendelian randomization (MR) permits causal inference between exposures and a disease. It can be compared with randomized controlled trials. Whereas in a randomized controlled trial the randomization occurs at entry into the trial, in MR the randomization occurs during gamete formation and conception. Several factors, including time since conception and sampling variation, are relevant to the interpretation of an MR test. Particularly important is consideration of the “missingness” of genotypes that can arise from chance, genotyping errors, or clinical ascertainment. Testing for Hardy-Weinberg equilibrium (HWE) is a genetic approach that permits evaluation of missingness. In this paper, the authors demonstrate evidence of nonconformity with HWE in real data. They also perform simulations to characterize the sensitivity of HWE tests to missingness. Unresolved missingness could lead to a false rejection of causality in an MR investigation of trait-disease association. These results indicate that large-scale studies, very high quality genotyping data, and detailed knowledge of the life-course genetics of the alleles/genotypes studied will largely mitigate this risk. The authors also present a Web program (http://www.oege.org/software/hwe-mr-calc.shtml) for estimating possible missingness and an approach to evaluating missingness under different genetic models.
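
As a rough illustration of the kind of HWE test the abstract refers to, the sketch below runs a one-degree-of-freedom Pearson chi-square test on genotype counts at a biallelic locus. It is not the authors' Web program; the function name hwe_chi_square and its arguments are hypothetical, and the chi-square approximation is assumed to be adequate (it can be poor for small samples or rare alleles).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test for Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts (AA, AB, BB) at a biallelic locus.
    Returns (chi2, p_value) using a chi-square distribution with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Expected genotype counts under HWE: p^2, 2pq, q^2.
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)

    # Skip classes with zero expectation (monomorphic locus).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A call such as hwe_chi_square(25, 50, 25) describes a sample sitting exactly at HWE proportions, whereas a sample with no heterozygotes, such as hwe_chi_square(50, 0, 50), yields a large statistic of the kind that missing heterozygous genotypes could produce.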

Crossref

PubMed Central

Explore Bristol Research

Modeling associations between genetic markers using Bayesian networks

Author: Altshuler
Browning
C. D. Maciel
E. Villanueva
Liu
Mueller
Nothnagel
Pritchard
Scheet
The ENCODE Project Consortium
Thomas
Thomas
Tishkoff
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: Understanding the patterns of association between polymorphisms at different loci in a population (linkage disequilibrium, LD) is of fundamental importance in various genetic studies. Many coefficients have been proposed for measuring the degree of LD, but they provide only a static view of the current LD structure. Generative models (GMs) were proposed to go beyond these measures, giving not only a description of the actual LD structure but also a tool to help understand the process that generated such structure. GMs based on coalescent theory have been the most appealing because they link LD to evolutionary factors. Nevertheless, the inference and parameter estimation of such models is still computationally challenging.
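
The abstract does not say which LD coefficients it has in mind. As an assumption, the sketch below computes two widely used ones, D and r², from phased haplotype counts at two biallelic loci. The function name ld_coefficients and the count arguments are hypothetical.

```python
def ld_coefficients(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) for two biallelic loci from the four haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes observed")

    p_AB = n_AB / total              # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total      # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total      # frequency of allele B at locus 2

    # D: deviation of the haplotype frequency from independence.
    d = p_AB - p_A * p_B

    # r2: squared correlation between the alleles at the two loci.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom
```

Both values summarize the current sample only, which is the "static view" that the abstract contrasts with generative models.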

Crossref

PubMed Central

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Universidade de São Paulo

The wave of next‐generation sequencing data has arrived. However, many questions still remain about how to best analyze sequence data, particularly the contribution of rare genetic variants to human disease. Numerous statistical methods have been proposed to aggregate association signals across multiple rare variant sites in an effort to increase statistical power; however, the precise relation between the tests is often not well understood. We present a geometric representation for rare variant data in which rare allele counts in case and control samples are treated as vectors in Euclidean space. The geometric framework facilitates a rigorous classification of existing rare variant tests into two broad categories: tests for a difference in the lengths of the case and control vectors, and joint tests for a difference in either the lengths or angles of the two vectors. We demonstrate that genetic architecture of a trait, including the number and frequency of risk alleles, directly relates to the behavior of the length and joint tests. Hence, the geometric framework allows prediction of which tests will perform best under different disease models. Furthermore, the structure of the geometric framework immediately suggests additional classes and types of rare variant tests. We consider two general classes of tests which show robustness to noncausal and protective variants. The geometric framework introduces a novel and unique method to assess current rare variant methodology and provides guidelines for both applied and theoretical researchers.

Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/97460/1/gepi21722.pd
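
To make the geometric picture concrete, the sketch below treats per-site rare allele counts in cases and in controls as two vectors and computes the two quantities the abstract names: their Euclidean lengths and the angle between them. It only illustrates the representation and does not implement any of the tests in the paper; the function names are hypothetical.

```python
import math


def vector_length(counts):
    """Euclidean length of a vector of per-site rare allele counts."""
    return math.sqrt(sum(c * c for c in counts))


def angle_between(case_counts, control_counts):
    """Angle in radians between the case and control count vectors."""
    if len(case_counts) != len(control_counts):
        raise ValueError("both vectors must cover the same variant sites")

    dot = sum(a * b for a, b in zip(case_counts, control_counts))
    norm = vector_length(case_counts) * vector_length(control_counts)
    if norm == 0:
        raise ValueError("the angle is undefined for an all-zero vector")

    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)
```

In this picture, a length test would respond to a difference between vector_length(case_counts) and vector_length(control_counts), while a joint test would also respond when angle_between(case_counts, control_counts) is far from zero.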

Crossref

Dordt College

PubMed Central

Deep Blue Documents at the University of Michigan

Genome-wide associations of gene expression variation in humans

Author: Andrew G Clark
Barbara E Stranger
Brenda Kahl
David Allison
Emmanouil T Dermitzakis
ENCODE Project Consortium
Mark J Minichiello
Matthew S Forrest
Panagiotis Deloukas
Robert Lyle
Samuel Deutsch
Sarah Hunt
Simon Tavaré
Stylianos E Antonarakis
The International HapMap Consortium
Publication venue: Public Library of Science
Publication date: 01/01/2005

The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12-13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
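
Two of the three corrections named in the abstract can be sketched in a few lines. The code below shows the Bonferroni rule and, as an assumed instance of false discovery rate control, the Benjamini-Hochberg step-up procedure; the abstract does not state which FDR method was used, and the function names are hypothetical. Permutation-based correction is not shown.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for FDR control at level alpha."""
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Largest rank k such that p_(k) <= k * alpha / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            max_rank = rank

    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected
```

With p-values [0.001, 0.012, 0.03, 0.2] and alpha 0.05, Bonferroni rejects the first two hypotheses and Benjamini-Hochberg rejects the first three, which shows why results can differ across correction methods.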

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

FigShare

g:Profiler—a web server for functional interpretation of gene lists (2011 update)

Author: Adler
Adler
Al-Shahrour
Antonov
Ashburner
Barrell
Benschop
Berriz
Billon
Doniger
Flicek
Gennarino
Green
Griffiths-Jones
Hamosh
Hatzis
Huang
Huang
Jaak Vilo
Jung
Jüri Reimand
Kahlem
Kanehisa
Kersey
Khatri
Krushevskaya
Kull
Lander
Matthews
Matys
McCarthy
McCarthy
Narsai
Parkinson
Reimand
Reimand
Reimand
Robinson
Sardiello
Schulz
Sealfon
Stark
Stegle
Tambet Arak
The 1000 Genomes Project Consortium
The ENCODE Project Consortium
The International Cancer Genome Consortium
Tretyakov
Vooder
Yagi
Publication venue: Oxford University Press
Publication date: 01/01/2011

Functional interpretation of candidate gene lists is an essential task in modern biomedical research. Here, we present the 2011 update of g:Profiler (http://biit.cs.ut.ee/gprofiler/), a popular collection of web tools for functional analysis. g:GOSt and g:Cocoa combine comprehensive methods for interpreting gene lists, ordered lists and list collections in the context of biomedical ontologies, pathways, transcription factor and microRNA regulatory motifs and protein–protein interactions. Additional tools, namely the biomolecule ID mapping service (g:Convert), gene expression similarity searcher (g:Sorter) and gene homology searcher (g:Orth) provide numerous ways for further analysis and interpretation. In this update, we have implemented several features of interest to the community: (i) functional analysis of single nucleotide polymorphisms and other DNA polymorphisms is supported by chromosomal queries; (ii) network analysis identifies enriched protein–protein interaction modules in gene lists; (iii) functional analysis covers human disease genes; and (iv) improved statistics and filtering provide more concise results. g:Profiler is a regularly updated resource that is available for a wide range of species, including mammals, plants, fungi and insects.
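
The abstract does not describe the statistics behind the enrichment analysis. As a generic baseline, and not as a description of g:Profiler's implementation, the sketch below computes the one-sided hypergeometric p-value that is commonly used to ask whether a gene list overlaps an annotation category more than expected by chance. The function name and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """P(X >= overlap) when drawing `selected` genes without replacement.

    population: total number of genes considered
    annotated:  genes in the population that belong to the category
    selected:   size of the query gene list
    overlap:    query genes that belong to the category
    """
    if not (0 <= annotated <= population and 0 <= selected <= population):
        raise ValueError("annotated and selected must lie within the population")
    if not 0 <= overlap <= min(annotated, selected):
        raise ValueError("overlap is larger than the category or the gene list")

    upper = min(annotated, selected)
    # Count the ways to draw at least `overlap` annotated genes.
    favorable = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(population, selected)
```

In practice this p-value is computed for every category tested, so it would normally be followed by a multiple testing correction.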

CiteSeerX

Crossref

PubMed Central

Mapping the Shh long-range regulatory domain

Author: Amano
Belloni
Bickmore
Chuong
Davis
Dixon
Echelard
Epstein
Hecksher-Sorensen
Jeong
Jeong
Klopocki
Kokubu
Lettice
Lettice
Lettice
Lettice
Lettice
Lettice
Liu
Marinić
Mates
Montavon
Nagy
Niedermaier
Osoegawa
Paek
Riddle
Ruf
Sagai
Sagai
Sagai
Sharpe
Sharpe
Shen
Smallwood
Spitz
Sun
Symmons
Symmons
The ENCODE Consortium Project
Tsukiji
Publication venue: The Company of Biologists
Publication date: 01/10/2014

Coordinated gene expression controlled by long-distance enhancers is orchestrated by DNA regulatory sequences involving transcription factors and layers of control mechanisms. The Shh gene and well-established regulators are an example of genomic composition in which enhancers reside in a large desert extending into neighbouring genes to control the spatiotemporal pattern of expression. Exploiting the local hopping activity of the Sleeping Beauty transposon, the lacZ reporter gene was dispersed throughout the Shh region to systematically map the genomic features responsible for expression activity. We found that enhancer activities are retained inside a genomic region that corresponds to the topological associated domain (TAD) defined by Hi-C. This domain of approximately 900 kb is in an open conformation over its length and is generally susceptible to all Shh enhancers. Similar to the distal enhancers, an enhancer residing within the Shh second intron activates the reporter gene located at distances of hundreds of kilobases away, suggesting that both proximal and distal enhancers have the capacity to survey the Shh topological domain to recognise potential promoters. The widely expressed Rnf32 gene lying within the Shh domain evades enhancer activities by a process that may be common among other housekeeping genes that reside in large regulatory domains. Finally, the boundaries of the Shh TAD do not represent the absolute expression limits of enhancer activity, as expression activity is lost stepwise at a number of genomic positions at the verges of these domains.

Crossref

PubMed Central

Edinburgh Research Explorer

A probabilistic generative model for GO enrichment analysis

Author: Alexa
Bader
Bar-Joseph
Cheung
Davis
Deutscher
Eisen
Ernst
Ernst
Ewing
Gasch
Gerard J. Nau
Giot
Grassme
Grossmann
Harbison
Ihmels
Itamar Simon
Jones
Kellis
Leem
Mewes
Mukherjee
Nasmyth
Natarajan
Nau
Navarre
Palomero
Park
Ren
Rojas
Roni Rosenfeld
Spellman
The ENCODE Project Consortium
The Gene Ontology Consortium
The Toxicogenomics Research Consortium
Thomas
Yong Lu
Ziv Bar-Joseph
Publication venue: Oxford University Press
Publication date: 01/01/2008

The Gene Ontology (GO) is extensively used to analyze all types of high-throughput experiments. However, researchers still face several challenges when using GO and other functional annotation databases. One problem is the large number of multiple hypotheses that are being tested for each study. In addition, categories often overlap with both direct parents/descendants and other distant categories in the hierarchical structure. This makes it hard to determine if the identified significant categories represent different functional outcomes or rather a redundant view of the same biological processes. To overcome these problems, we developed a generative probabilistic model which identifies a (small) subset of categories that, together, explain the selected gene set. Our model accommodates noise and errors in the selected gene set and GO. Using controlled GO data, our method correctly recovered most of the selected categories, leading to dramatic improvements over current methods for GO analysis. When used with microarray expression data and ChIP-chip data from yeast and human, our method was able to correctly identify both general and specific enriched categories which were overlooked by other methods.

Crossref

PubMed Central